Decimation based algorithm to improve inference using the Pseudo-Likelihood 
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In this Letter we propose a new method to infer the topology of the interaction network in 
a pairwise model with Ising variables. By using the pseudo-likelihood method (PLM) at high 
temperature, it is generally possible to distinguish between zero and non-zero couplings, because a 
clear gap separate the two groups. However at lower temperatures the PLM is much less effective and 
the result depends on subjective choices, as the value of the ^i-regularizer and that of the threshold 
to separate non-zero couplings from null ones. We introduce a decimation procedure based on PLM, 
that recursively sets to zero the less significant couplings, until the variation of the pseudo-likelihood 
signals that relevant couplings are being removed. The new method is fully automatized and does 
not require any subjective choice by the user. Numerical tests shows that it performs better than 
standard PLM. 
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INTRODUCTION 



Recent years have seen a growing interest of the sta- 
tistical physics community in the so-called 'inverse Ising 
problem' [lHlf- Although the problem is known since 
a long time ago under the name of Boltzmann machine 
learning [8[, the renewed interest is linked to the large 
number of datasets coming from many different fields — 
e.g. biology, physics, neuroscience — that require a quan- 
titative description. Building reliable models for these 
datasets has become therefore a fundamental problem 
[9l4l2|. Given a dataset and a model, the inverse prob- 
lem aims to find the parameters of the model that best 
fit the data. Among all possible models, the Ising one, 
although very simple — it involves only pairwise interac- 
tions between discrete variables — can take into account 
a wide range of phenomena. It is thus natural to develop 
inference methods for this statistical model, as initially 
done by Boltzmann machine learning [8j]. 

A common approach in Bayesian inference is to infer 
the model couplings by maximizing the likelihood func- 
tion. Unfortunately, the likelihood depends on the parti- 
tion function which is extremely difficult to compute in 
general. In this Letter we will therefore use the pseudo- 
likelihood approximation 13|, Il4j to perform inference on 
Ising models of reasonable size. This method has at least 
three advantages. First, it is possible to minimize the 
pseudo-likelihood function (PLF) in polynomial time for 
any Ising model. Second, the method is known to be ex- 
act in the case of infinite sampling [2j . Finally, as we will 
see in this Letter, the PLF can be used as an indication 
of how good the reconstruction is. It will allow us to de- 
crease the number of model parameters progressively and 
to understand when it is worth stopping, thus leading to 
an accurate model selection algorithm. 

We begin by introducing the pseudo- likelihood method 
(PLM) that has been already tested in inferring finite di- 
mensional models and found to be very effective [2] . We 



also recall the £i-regularization extension and the thresh- 
olding procedure for discriminating non-zero couplings. 
We discuss the problem of very large coupling appearing 
in the PLM and how we solve this issue without using 
the £i-regularization. Finally, we describe the new deci- 
mation procedure to select accurately the model param- 
eters and we show that the PLF can be used as a reli- 
able likelihood. We conclude by showing that our new 
symmetrized PLM with decimation provides a very good 
inference of model couplings, better than standard PLM. 



THE PSEUDO-LIKELIHOOD METHOD 

The PLM is used in cases where the model parame- 
ters cannot be inferred correctly using mean-field meth- 
ods and where the system size is too large to be able to 
maximize the true likelihood. 

Let's consider an Ising model with probability measure 
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whose parameters {Jij,hi} we want to infer, given a set 
of M configurations {s_^}k=i...,M independently sam- 
pled from PBoitz(s)- We define respectively the mag- 
netizations and the correlations of the data as mP = 






m^mP . 



Bayesian inference prescribes to maximize with respect 
to the model parameters the likelihood of the data 

£ = P Y\ M<& + m?m?) + (3 Yhimf - log(Z) . (2) 



i<j 



The maximum of C corresponds to parameters { J*-, h*} 
matching the magnetizations and the correlations of the 
inferred model with those from the data 
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There are two general methods to find the parameters 
{J*j,h*} that fulfill these conditions. The first one is 
by directly maximizing C, but this requires to compute 
the partition function many times, which is impossible 
in practice. The second one, named Boltzmann machine 
learning [8j, computes the left hand sides of cqs. pi4[) by 
Monte Carlo methods and updates the parameters as 
Jf™ = ^( c d - C y({J° ld ,/i° ld })), where 77 is a learning 
factor. Again this requires to run long Monte Carlo sim- 
ulations evaluating the average values with a precision 
good enough to sufficient to employ (|3l4p . 

An important aspect in the inverse problem is there- 
fore to find an approximation that provides a reason- 
ably good estimate for the model parameters in a quick 
time (usually polynomial in the system size). A review of 
known approximations can be found in |15| , to which we 
should add the more recent methods, as those based on 
the Bethe approximation [3J, |j] , the adaptive cluster ex- 
pansion [l[ and the probabilistic flow method [16[ . In this 
Letter we compare our new decimation technique based 
on the PLM against the £i-regularization which is very 
much used to find the interaction topology |2|, ll4{ . Stan- 
dard mean-field method will not be considered as in gen- 
eral they perform weakly on finite dimensional systems, 
neither the adaptive cluster expansion nor the probabilis- 
tic flow method, since they strongly depends on several 
subjective choices (thresholds, dynamics, etc.). 

Instead of using the likelihood function, which is very 
hard to compute, we use the PLF, VC — J2 r ^n where 
the index r runs over all variables, the "local" likelihood 
functions are given by 
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and the conditional probability of variable s r given the 
rest of the system is 



P( s rk\ r ) = 1+ exp(— 2(3s r (h r + 2J JrjSj)) 



(6) 



The standard implementation of the PLM consists in 
maximizing each of the N local likelihood functions C r 
separetely, thus getting two different estimates for each 
coupling Jij : a first one J*f from the maximum of Ci and 

a second one J*- from the maximum of Cj (hereafter we 
assume hi — and Jy £ {0,1} for the ease of presen- 
tation). Since the Ising model has symmetric couplings, 
the final estimate for the coupling Jy is then obtained 
by taking the average, J*- = (Jfj + J*j)/2. 

This method has been confronted against mean-field 
methods for the SK model in [2| and it clearly estimates 
the couplings much better at low temperatures. For 
sparse models, its ability to infer correctly the interaction 
network (i.e. which couplings are non-zero) is largely im- 
proved by the use of the ^i-regularization pjj, thus max- 
imizing the local functions Cy — C r + A J^ ■ / r \J r j\, with 



a suitably chosen (and not too large) A regularizer. A 
further improvement in inferring the model topology has 
been achieved |2f by setting to zero all couplings whose 
estimate is below a threshold, | J*„ | < S (but the choice 
of S is delicate, as we discuss below). 

Using the standard PLM we observe that, in difficult 
situations (e.g. when M is not large enough and temper- 
ature is low) some couplings are largely overestimated, 
\J*j\ » 1. In those cases, the inferred couplings are not 

symmetric (J*- ^ J*-) and only one of the two estimates 
is very large. The origin of this problem is to be found 
in the data information content, that is sometimes very 
poor around some variable s, : in that case the estimates 
J*? are strongly unreliable (think e.g. to what happen if 
in the M samples Sj and its neighbors were almost always 
perfectly aligned). This problem is often partially solved 
by using the ^i-regularization: the regularization param- 
eter A adds a weight on non-zero couplings and therefore 
prevents a coupling for being too large. A drawback of 
this approach is the tendency to underestimate globally 
the couplings and is therefore not completely satisfying. 
In this Letter, we choose a different solution: by maxi- 
mizing the PLF VC (rather than the N functions C r sep- 
arately) we look for a compromise, where the estimate J* 
must be such that both d and Cj are reasonably large. 
The advantage of this maximization is that it provides a 
unique estimate for each coupling, which is in general of 
the right order of magnitude (unless the information con- 
tent of the data is poor in a wide region) . The PLF can 
be maximized by a standard Newton method, while for 
the ^i-regularized functions we use the one of Ref. [171 ] . 



INFERRING TOPOLOGY USING PLM 

In inference problems, the "model selection" is the abil- 
ity to reduce the number of model parameters in order 
to keep only the essential ones (by Occam's razor rule 
the simplest model fitting a dataset is the one to be pre- 
ferred). This model selection is both crucial to iden- 
tify the structure underlying a dataset (e.g. the interac- 
tion network or the model topology) and to improve the 
quality of the coupling estimates (relevant couplings are 
better inferred after excluding the insignificant couplings 
from the model). 

A common approach in the Ising inverse problem is to 
choose which are the null couplings by putting a cut-off 
on their estimated values. This is a reasonable choice as 
long as a clear gap separates the estimates of null cou- 
plings (Jo) from those of non-zero couplings (Ji), but 
it is not always the case. In the upper panel of Figfl] 
we show an easy case, where a gap exists between esti- 
mates of Jo (red points) and those of J\ (blue points). 
However at lower temperature (upper data curve in the 
lower panel of FigfT]) the gap is completely absent and 
a good (S-thresholding to split Jq and J\ is impossible. 





600 800 1000 1200 



200 400 600 800 1000 

Number of non-decimated couplings 



FIG. 1. (color online) One typical sample of a 2D ferromag- 
netic Ising model with 30% dilution (M = 4500). Top: PLM 
inferred couplings at j3 = 0.5 have a clear gap (blue crosses are 
Ji couplings). Bottom: PLM inferred couplings at = 0.9 
have no gap (upper data), while PLM+£i shows a gap but 
does not recover perfectly the topology (lower data, A = 0.01). 



FIG. 2. Same model as Fig. Q] with /3 = 0.9 and M = 4500. 
The tPLF increases as the number of non-decimated couplings 
is reduced until it exhibits a maximum. In this case the max- 
imum corresponds to a complete recovery of the graph topol- 
ogy and therefore to a small reconstruction error. The inset 
is a zoom on the maximum region. 



For sparse models (as the one in Fig. [1} the use of the 
£i-regularization may induce a new gap (see lower data 
curve in the lower panel of FigJTJ). However this £\- 
induced gap does not always separates correctly J\ from 
Jo couplings: many green points are above the gap in 
the lower panel of FigfT] Moreover the estimates of J\ 
are systematically smaller than the true values, and this 
is unavoidable when using the ^i-regularization. Finally, 
there is no consensus on how to choose the A value. 

We propose a new method for inferring non-zero cou- 
plings J i that solves all the above problems. Our idea 
is to recursively set to zero couplings which are esti- 
mated very small by the PLM: we always maximize VC 
so as to avoid too large couplings and the bias due to 
the £i-regularization. At each decimation step we set 
to zero a finite fraction p of the remaining couplings, 
as such the total number of steps is 0(logN) and the 
PLM+decimation algorithm is competitively fast. This 
fraction p is actually the only choice left to the user and 
the results are largely independent on it (in our tests we 
have used p < 0.05). Setting couplings to zero gradu- 
ally is equivalent to use an adaptive thresholding with 
a very small 8 value: so our new method should per- 
form better than any standard thresholding procedure, 
especially because it does not require the existence of a 
gap in the inferred couplings (thus avoiding the use of 
£i-regularization that produces biased estimates and a 
strong A-dependence). 

The stopping criterion for the decimation procedure is 
based on the behaviour of the PLF. Indeed, we expect 
that, as long as the decimation procedure sets to zero 
couplings in Jo which are unnecessary to fit the data, 
the PLF should not change significantly. On the con- 



trary, the pruning of a coupling in J\ should produce a 
drastic decrease in the PLF value. This expected behav- 
ior is confirmed by the numerical simulations. In prac- 
tice we would like to stop the decimation at the point 
where the PLF variation, AVC/An with An being the 
number decimated couplings in the last step, goes from 
'small' to 'large' values. To make these two adjective 
quantitative we can compute the overall mean PLF vari- 
ation during the decimation, that is the change in PLF 
between the fully connected model [where VC is max- 
imized over all the N(N — l)/2 possible couplings and 
takes value P£ m ax] and the model of independent vari- 
ables [no couplings left by the decimation and PLF equal 
to — JVlog(2)]. The mean PLF variation is thus equal 
to (7>£ max + N\og(2))2/(N(N - 1)) and we propose to 
stop the decimation where the PLF variation reaches this 
mean value, that should separate 'small' from 'large' PLF 
variations. In practice it is more convenient to define the 
stopping point as the maximum of the tilted PLF (tPLF) 



/t^ /'tilted -tt> r 



xPC max + (l-x)N\og(2), (7) 



where x is the fraction of non-decimated coupling. It is 
easy to check that •p£ tlltcd = both before starting the 
decimation (a; = 1) and on a model with no coupling 
(a; = 0). In the interval [0,1] a maximum appears if 
correlations are present in the dataset. 

In Fig.[5]we show, for a case where inference is difficult, 
the tPLF as a function of the number of non-decimated 
couplings and the corresponding error in inferring cou- 
plings, defined as 
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unachievable and in general we believe that optimizing 
over S should not produce any sensible improvement as 
long as S is correctly chosen in the gap (to this respect 
the improvement obtained in Ref. [2J makes us suspect 
that a too large S value was chosen based on the previ- 
ous knowledge of the topology) . So a fair comparison of 
our new method with the standard PLM should be made 
by choosing 5 < 10 -2 and the improvement is then very 
large. The stopping point selected by maximizing the 
tPLF is shown by a large dot in Fig. [3l and is indeed the 
closest to the upper right corner (full topology recover). 



CONCLUSION 



FIG. 3. The ROC graph for a typical sample of the 2D fer- 
romagnetic Ising model with 30% of dilution at /3 = 1.0 and 
M = 4500 (difficult case). The upper right corner corre- 
sponds to the perfect reconstruction. For PLM+^i curves are 
drawn by varying A and they almost coincide if the threshold 
8 is correctly chosen in the gap, 8 £ [10 -8 , 10 -2 ]. The better 
value 8 = 0.1 can not be chosen without previous knowledge 
about the topology. The PLM+decimation is clearly inferring 
topology better, even if PLM+^i is finely tuned over A and 8. 



The zoom in the inset clearly shows that the maximum in 
the tPLF does corresponds to the minimum in the infer- 
ence error. Please notice also as the density of data points 
along the curves changes, because we have decreased the 
value of p during the decimation in order to spend less 
time in the initial part (which is easy) and have denser 
points close to maximum (and thus improve its location). 
Our new inference algorithm (PLM+decimation) can 
be very well compared to the standard PLM with t\- 
regularization and ^-thresholding by plotting the corre- 
sponding ROC curves (see Fig. [3j) . Each ROC curve is 
obtained by plotting parametrically the true positive rate 
(i.e. couplings in J\ inferred as non-zero divided by the 
total number of couplings in J{) versus the true negative 
rate (i.e. couplings in Jo inferred as null divided by the 
total number of couplings in Jo). The ROC curve for 
an exact inference method run on noiseless data would 
pass through the upper right corner. In general an infer- 
ence method is better the larger the area below the ROC 
curve. In the present case, the ROC curves for the stan- 
dard PLM have been drawn by varying A at fixed <5, while 
for the PLM+decimation the ROC curve has been drawn 
by varying the fraction of decimated couplings. Clearly 
the new method is outperforming standard PLM, even if 
the latter is optimized over A and 5. To this respect, it 
is worth noticing that 5 values in the range [1CT 8 , 10~ 2 ] 
do actually fall in the gap (see lower panel of Fig. [Tj and 
lead to very similar ROC curves, while 5 — 0.1 is outside 
the gap and would be impossible to choose that value 
without knowing in advance the topology we are looking 
for. So the better ROC curve with S = 0.1 is practically 



We have presented a new method for inferring the in- 
teraction topology of an Ising model which is based on 
the PLM and a decimation procedure that recursively 
sets to zero couplings which are inferred as the weak- 
est. The procedure is fully automatized (apart from the 
choice of p which is mostly irrelevant for the results) 
and provides a unique answer to the inverse Ising prob- 
lem. Execution times are comparable to those of stan- 
dard PLM (i.e. polynomial in system size), apart from 
an extra 0(log N) multiplicative factor (but remind that 
maximization without the £i-regularizer is easier). As 
the standard PLM, also our new method is exact in the 
large M limit, and shows for finite (and small) values of 
M much better performances than standard PLM with 
£i-regularization and ^-thresholding (which was consid- 
ered among the best inference techniques available). 
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